Interpreting Association from Graphical Displays 

Noleine Fitzallen 
University of Tasmania
<Noleine.Fitzallen@utas.edu.au>

Research that has explored students’ interpretations of graphical representations has not 
extended to include how students apply understanding of particular statistical concepts 
related to one graphical representation to interpret different representations. This paper 
reports on the way in which students’ understanding of covariation, evidenced through their 
interpretation of scatterplots, was applied to the interpretation of split stacked dot plots. The 
outcomes of the study suggest that incomplete understanding of the characteristics of a 
graph and the data displayed can lead to students applying knowledge of statistical concepts 
relevant to one graph type to misinterpret a different graph type. 

In recent times, the research agenda in statistics education research has focused on the 
nature and development of statistical literacy, reasoning, and thinking (Garfield & Ben-Zvi,
2004) with particular attention given to the key statistical concept of informal
inference (e.g., Rubin, Hammerman & Konold, 2006; Watson & Donne, 2008; Watson & 
Wright, 2008). Much of the research has been conducted within the context of the learning 
environments afforded by new technologies (e.g., Paparistodemou & Meletiou- 
Mavrotheris, 2008) but little attention has been given to how students translate their 
knowledge and understanding of one particular graphical representation to another. For the 
most part, that research has focused on one particular graphical representation to explore 
the development of particular key statistical concepts. For example, Watson and Donne 
explored students’ use of hat plots when making informal inferences, and Rubin et al. 
explored the sorting of data into bins and the display of visual and numerical information 
simultaneously when comparing groups. A number of studies, however, have given 
students the freedom to construct multiple representations to make sense of the data (e.g., 
Rubin et al.). Although favourable outcomes have been reported, Rubin and her colleagues 
and Bakker (2002) warn that providing open access to the full suite of features available in 
some graphing software programs may be overwhelming and distracting for some students. 
Given the ability that these new technologies have to produce non-traditional graphical 
representations in conjunction with traditional graphical representations (Watson & 
Fitzallen, 2016), it is worthwhile investigating the way in which students apply their
understanding of statistical concepts developed using one particular graphical 
representation to the interpretation of other graphical representations. 

Interpreting Graphical Representations 

Graphical representations have many characteristics that can be used when interpreting 
the data displayed. The characteristics, such as the mode, scale of an axis, or the variation 
in the spread of the data can be extracted directly from the graph (Roth, Pozzer-Ardenghi, 
& Han, 2005) or from calculations performed by graphing software (Watson & Fitzallen,
2016), such as TinkerPlots Dynamic Data Exploration (Konold & Miller, 2011). An 
understanding of the context and the nature of the variables of interest may be gained from 
personal experiences of the context, from information about the data, or from the details 
embedded in the scales and frameworks of the graphs (Roth et al., 2005; Watson & 
Fitzallen, 2016). Collectively, the elements of graphs are resources that provide a link
between the visual two-dimensional representations and the real-world measurement
situations; relevant to this study are scatterplots and split stacked dot plots. 

Scatterplots and Split Stacked Dot Plots 

Developed in the 1800s, a scatterplot is a graphical technique used to display paired 
measurements of two quantitative variables and used to explore the relationship between 
two numerical attributes. Scatterplots are characterised by data points that correspond to the
measures of two variables designated at the same time on a Cartesian graph (Moritz, 2004). 
Each data point on a scatterplot corresponds to one unit of analysis between the two 
variables and the values of the two variables may be said to involve some form of
relationship, association, function, dependency, or correspondence (Cobb, McClain, & 
Gravemeijer, 2003; Moritz; Zieffler & Garfield, 2009). Scatterplots have numerical 
attributes on both axes of a graph and are used to display covariation (Moritz, 2004). 
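
As an illustration only, the covariation displayed in a scatterplot can be summarised numerically. The sketch below computes Pearson's correlation coefficient, which is one common summary and not a measure the students in this study were asked to use. The attribute names echo those in the data set, but the values are hypothetical and were chosen purely for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements (cm); each position is one person,
# that is, one data point on the scatterplot.
heights = [140, 145, 150, 155, 160, 165]
belly_button_heights = [84, 86, 91, 92, 97, 99]

r = pearson_r(heights, belly_button_heights)
print(f"r = {r:.2f}")
```

A value of r close to 1 or -1 corresponds to points that lie near a straight line, which is the kind of trend a reader looks for when judging covariation by eye.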

Split stacked dot plots are distinctly different to scatterplots. They are non-traditional
graphs that are easy to construct using interactive graphing software, such as TinkerPlots. 
These graphs display a numerical attribute on one axis and a categorical attribute on the 
other axis, which facilitates a comparison of categories or multiple data sets for the one 
numerical attribute. Like scatterplots, split stacked dot plots can be used to display 
association. They are, however, used primarily to make comparisons between groups and 
support the analysis of data that go beyond direct comparisons (Watson & Wright, 2008). 
It is, however, the direct comparisons of the visual characteristics of the plots that students 
are able to use to determine if there is an association between the two attributes displayed. 

Two examples of split stacked dot plots are provided in Figure 1. Added to the graphs 
are hat plots that divide the distribution of the data into three sections. The hat plot 
resembles a hat and is made up of two main components. The crown of the hat is a 
rectangle that shows the middle 50% of the data and the brim of the hat is a line that 
extends across the full range of the data set. The lower 25% of the data is represented by 
the line to the left of the crown and the upper 25% of the data is represented by the line to 
the right of the crown. A hat plot can only be applied when one or more of the axes of a 
graph have a continuous scale. The graph on the left shows there is an association, thereby 
a relationship, between the gender and height of adults. This is determined by the distinct
differences in height of the two groups — the ranges of the crowns of the hat plots and the 
means (denoted by the ▲ symbol) are different for the genders. The graph on the right
shows there is no association, therefore no relationship, between the gender and height of 
children. This is determined by the similarity of the height of the two groups — the ranges 
of the hat plots and the means are essentially the same for both genders.

Figure 1. Stacked dot plots.
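
The hat plot components described above can be sketched in code. The example below is a minimal illustration under stated assumptions: the function names `hat_plot_summary` and `crowns_overlap` are hypothetical, the height values are invented for the example, and the quartile rule (Python's "inclusive" method) may differ from the rule TinkerPlots applies. Checking whether the crowns overlap is a rough visual heuristic, not a formal statistical test.

```python
from statistics import mean, quantiles


def hat_plot_summary(values):
    """Return the brim (full range), crown (middle 50%), and mean of a group."""
    # Assumption: quartiles use the "inclusive" method; TinkerPlots may differ.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "brim": (min(values), max(values)),
        "crown": (q1, q3),
        "mean": mean(values),
    }


def crowns_overlap(a, b):
    """True when the middle-50% intervals of two groups share any values."""
    (a_low, a_high), (b_low, b_high) = a["crown"], b["crown"]
    return a_low <= b_high and b_low <= a_high


# Hypothetical heights (cm), invented for illustration only.
adults = {
    "female": [155, 158, 160, 162, 163, 165, 167, 170],
    "male": [168, 171, 174, 176, 178, 180, 183, 186],
}
children = {
    "female": [128, 131, 133, 135, 136, 138, 140, 143],
    "male": [127, 131, 134, 135, 137, 138, 141, 142],
}

for label, groups in (("adults", adults), ("children", children)):
    summaries = {g: hat_plot_summary(v) for g, v in groups.items()}
    for g, s in summaries.items():
        print(label, g, s)
    print(label, "crowns overlap:",
          crowns_overlap(summaries["female"], summaries["male"]))
```

With these invented values, the adult crowns are separated, which is consistent with an association between gender and height, whereas the children's crowns overlap almost entirely, which is consistent with no association.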

Distinguishing Between Covariation and Association

Although related, covariation of numerical attributes is a distinctly different form of 
association from that seen in Figure 1. However, the definition of covariation is often 
included in the definition for association (Batanero, Estepa, Godino, & Green, 1996; 
Moritz, 2004; Zieffler & Garfield, 2009). Batanero and her colleagues describe covariation 
as a form of association. They note that association is “the analysis of contingency tables,
the determination of correlation between quantitative variables, and the comparison of a 
numerical variable in two or more samples” (p. 151). Describing covariation, Zieffler and
Garfield note that “[r]easoning about association (or relationship) between two variables,
also referred to as covariational reasoning, or reasoning about bivariate data, involves
knowing how to judge and interpret a relationship between two variables” (p. 7). Similarly,
Moritz suggests, “[c]ovariation concerns association of variables; that is, correspondence
of variation.
Reasoning about covariation commonly involves translation processes among raw 
numerical data, graphical representations, and verbal statements about statistical 
covariation and causal association” (p. 227). Moritz also adds, “The more general term 
statistical association may also refer to associations between two categorical variables, 
commonly represented in two-way frequency tables, and between one categorical and one 
interval variable, often formulated as the comparison of groups” (p. 228). 

The Study 

The research reported in this paper is part of a study that explored students’ use of 
TinkerPlots Dynamic Data Exploration (Konold & Miller, 2011) to construct and interpret
graphical representations. It involved 12 Year 5/6 students (11-12 years old). The students
worked in pairs with the teacher/researcher (45 minutes, twice a week for 6 weeks) through
a sequence of learning experiences designed to provide them with the opportunity to 
develop an understanding of various statistical concepts and graphical representations 
using TinkerPlots. The learning sequence included activities related to distribution, 
variation, and increasing sample size as well as the construction of dot plots, bar graphs, 
value bar graphs, scatterplots, and stacked dot plots. At the end of the learning sequence 
the students were interviewed individually as they used TinkerPlots to create various 
graphical representations of their choice to show the relationship between two attributes 
from the data set provided. The data set included both categorical and numerical data. The 
activities were set up in TinkerPlots as an interview protocol and were designed to provide 
the opportunity for the students to demonstrate what they had learned during the sequence 
of learning experiences. On-screen capture video was used to record the students’ actions
as they used TinkerPlots. The video also captured audio recordings of the students’ 
explanations of actions taken and responses to questions posed by the teacher/researcher. 
Prior to the analysis of the data for this paper, the video data were analysed to determine
the students’ level of understanding of covariation. The results of that analysis provide reference
points from which the analysis of the data for this paper is discussed. 

Students’ Level of Understanding of Covariation

Paparistodemou and Meletiou-Mavrotheris (2008) contend that TinkerPlots enhanced 
young students’ opportunities to find relationships between two variables in the data and to 
draw conclusions from the data. In this study, the students’ statements, descriptions, and 
justifications about the relationships seen in the graphs were analysed to evidence the 
complexity of the responses and the level of understanding attained.

To determine the students’ level of understanding of covariation, the data were
analysed according to the levels of the SOLO taxonomy (Biggs &
Collis, 1982). Of the 12 students interviewed, six students’ responses were uni-structural, 
three students’ responses were multi-structural, and three students’ responses were 
relational (Table 1). At the uni-structural level the students made statements that often 
involved a declaration that there was or was not a trend evident in a scatterplot but little or 
no justification or reasoning was offered to explain how they made the judgement. At the 
multi-structural level, the students used multiple characteristics of scatterplots to explain 
and justify their thinking as they described the covariation identified. Students who 
demonstrated understanding at the relational level also used multiple characteristics of 
graphs to make their decisions. They also went further to identify the variation in the graph 
that did not meet their expectations for a relationship to exist (Fitzallen, 2012). 

Table 1.
Students’ Achievement for Covariation According to the Levels of the SOLO Framework

Uni-structural               Multi-structural    Relational
Jake, Natasha, Rory,         Shaun, Blaire,      James, William,
Johnty, Natalie, Kimberley   Jessica             Mitchell


Data Analysis 

The audio data from the video recordings were transcribed verbatim and descriptions 
of the students’ actions observed on the videos were added to the transcribed data. Content 
analysis of the transcripts (Miles & Huberman, 2003) involved analysing the transcripts 
line-by-line to code the responses and actions according to the four dimensions of the 
Graphing in EDA Software Environments framework — Generic [ICT] knowledge, Being 
creative with data, Understanding data, Thinking about data (Fitzallen, 2012). After this 
initial categorisation, analysis of the grouped data was dominated by, but not restricted to, 
the data coded for the dimensions Understanding data and Thinking about data. That 
analysis focused initially on determining the students’ level of understanding of
covariation (Fitzallen, 2012). A second round of data analysis was undertaken to determine
the way in which the students constructed and interpreted split stacked dot plots, 
particularly in relation to their selection for showing the relationship between two 
attributes. The data from the second data analysis iteration are reported in this paper and
discussed in relation to the results of the first iteration of data analysis. 

Results 

Students’ Interpretation of Split Stacked Dot Plots

As the students worked through the activities and questions in the interview protocol, 
they constructed a variety of graphs, including scatterplots and split stacked dot plots. An 
example of the graphs created is provided in Figure 2. The final task in the interview 
protocol set up in TinkerPlots required the students to look at all the graphs created during 
the session and determine which graph showed the strongest relationship between two
attributes. It was anticipated that the students would select scatterplots that displayed 
covariation. Unexpectedly, of the 10 students who completed this task, seven indicated that
split stacked dot plots were the ones that displayed the strongest relationship between two 
attributes (see Table 2). Jessica and Kimberley did not make a contribution to the
following results as they ran out of time and did not complete the final question on the
interview protocol. 



Figure 2. Graphs constructed by Mitchell using the interview protocol. 


Table 2.
Students’ Selection of Graphical Representations to Display the Relationship Between Two
Attributes

Split stacked dot plots          Scatterplots
Blaire, Jake, Johnty, Natalie,   Natasha, James, William
Mitchell, Rory, Shaun


All of the students had examples of scatterplots and split stacked dot plots similar to 
those in Figure 2 from which to choose. The seven students who chose split stacked dot 
plots selected graphs that displayed the association between one numerical attribute and 
one categorical attribute. All of the split stacked dot plots created by the group of students 
were similar but varied somewhat. An example is provided in Graph 1 in Figure 2. The 
students individualised their graphs by accessing different features of TinkerPlots, such as 
the mean, hat plots or reference lines. Regardless of the attributes chosen, the features 
added and the scale of the axes selected for the split stacked dot plots, each of the students 
indicated that it was the similarity or closeness of the information from the visual features 
in each section of the graphs that influenced their decisions. As can be seen in the split 
stacked dot plot in Graph 1 in Figure 2, the mean for the males and females is the same and the
range of the crown of the hat is also very similar for both genders. When making the 
decision about the relationship between two attributes from this type of graph, the students 
gave little attention to the distribution of the data across the outer reaches of the hat brim. 
On occasions, the overall range of the data for each group was considered but students only 
mentioned the range of the data when there was a large difference in the ranges. 

Johnty, Rory, and Blaire chose split stacked dot plots that displayed the association
between gender and height as the ones that showed the strongest relationship between the
two attributes chosen. Blaire used the mean to make her decision. She said she was
confident it was the “best” graph because the mean height was the same for both genders. 
Johnty also used the mean to draw his conclusion but justified his decision further by 
comparing the crowns of the hat plots. As Rory made his choice he said, “[t]he hat plots 
[are] exactly the same. So is the average.” 

Jake and Natalie also used the crowns of the hat plots to justify their choices of a split 
stacked dot plot that displayed gender and foot length. Adding to Natalie’s confidence was 
the distribution of the data. She said, “It is not as spread out [pointing to a split stacked dot 
plot]. This one is not as neat and you don’t know how many dots there are [pointing to a 
scatterplot].” Shaun also selected a graph with gender and belly button height and said, 
“They’ve got the same amount of people like under the [hat] roughly about 50% and 
roughly about the same . . . outside the 50%.”

The four graphs in Figure 2 were constructed by Mitchell. They are typical of the 
graphical representations the students constructed during the student interview. From the 
selection of graphs, Mitchell asserted that Graph 1 showed the strongest relationship 
between height and gender. Mitchell determined that the heights of the males and females
were the same by comparing the two groups using the mean, the position of the crown of
the hat, and the spread of the data. He took the closeness of these characteristics of the
data to indicate that there was a strong relationship evident in the graph. By convention,
interpretation of this graph would show there was no relationship or association between 
gender and height. 

William, Natasha, and James selected scatterplots that displayed the relationship 
between belly button height and height to be the ones that showed the strongest 
relationship between two attributes. In all three cases, the decision was based on the trend 
evident in the data. For example, when William pointed to the scatterplot with height and 
belly button height displayed, he said: “Umm . . . probably this one, because umm, it shows 
us that the strongest, was umm, it shows you that the height, depending on where it is, 
chances are that that’s where the bigger belly button height is, probably.” Like the other 
seven students, these three students had split stacked dot plots showing no association in 
their collection of graphs but chose to focus their decision making on the scatterplots. 

Discussion 

It is customary to use scatterplots to determine if there is a relationship between two
numerical attributes (Moritz, 2004). It was within the context of scatterplots that students 
demonstrated their understanding of covariation, which involves reasoning about the 
relationship displayed in the data. In doing so, they identified the trend and, in some cases, 
were able to describe the way in which one of the attributes increased in much the same 
way and in conjunction with the other attribute (Fitzallen, 2012). Although split stacked 
dot plots may display an association, thereby the relationship, between two attributes, 
different interpretation strategies than those used to interpret scatterplots are needed. 
Analysis of the results presented in this paper suggests the students transferred their naive 
understanding of covariation to interpret split stacked dot plots. Their interpretations of the 
split stacked dot plots were based on the way in which the data were the same for each 
gender. With these types of graphs, the “sameness” of the data is an indication that there is 
no relationship between the two attributes. With scatterplots, the idea that the attributes are 
behaving in the same way is an indication that there is a relationship between the two 
attributes.

The decision by seven of the students to select split stacked dot plots, which displayed
one numerical attribute and one categorical attribute, as the graphs showing the strongest
relationship between two attributes suggests that they did not appreciate the limitations
of the data or the graphical representation chosen. The association they identified due to
the similarity of the visual representations suggests that some of the students had 
transferred their understanding established within the context of scatterplots to the context 
of split stacked dot plots. This revealed that the students had not established fully an 
understanding of the purpose of the two different graphical representations and the 
meaning they embodied. 

Of the seven students, five demonstrated the lowest level of understanding of 
covariation, that is, the uni-structural level. Only Natasha demonstrated a higher level of
understanding of covariation (relational) than demonstrated previously when she chose a 
scatterplot as the one that showed the strongest relationship between two attributes. 
Conversely, Mitchell performed at the highest level of understanding for covariation 
(relational) previously and then transferred this understanding to the interpretation of a 
split stacked dot plot. It appears Natasha’s and Mitchell’s levels of understanding of 
covariation were not stable. Had the question about making a choice from a selection of 
self-generated graphs not been included in the interview protocol, it may not have become
evident that some students had not established fully their understanding of the utility of 
scatterplots and split stacked dot plots. These results add support to the warning offered by 
Rubin et al. (2006) and Bakker (2002) that students may be overwhelmed when given 
access to software packages that give them access to various graphical representations. The 
students in this study did not appear to be overwhelmed by the features of TinkerPlots or
by the opportunity to construct multiple graphical representations for the same data.
The issue is that some of the students did not always choose graphs appropriate for 
answering the question asked. 


Conclusion 

The results reported in this paper are limited as the study only involved 12 students. 
They do, however, offer insights into the way in which students make sense of and 
interpret various exploratory data analysis representations. They also draw attention to the 
need to extend research to encompass student interpretation of non-traditional graphs made 
possible through innovative graphing software. A search of the literature did not reveal any 
research that explored specifically how students transfer understanding of statistical 
concepts to various graphical representations. To date, research has treated the exploration 
of student use of graph types discretely (e.g., Watson & Donne, 2008) rather than focusing
on the interconnectedness or not of various graph types. Further exploration of the 
development of key concepts such as data, distribution, centre, variability, outliers, 
sampling, and comparing groups (Biehler et al., 2013) within the context of multiple graph 
types is required. Research has a role to play in developing models of learning that support
teachers to counter the problems faced by students when transferring knowledge developed 
in one graphing context to other graphing contexts. An appreciation of the different 
thinking needed and used by students to interpret various graphical representations has the 
potential of empowering teachers to guide student learning towards an understanding of the 
particular learning outcomes targeted. Teachers will then be in a better position to choose 
the most appropriate data sets and graphical representations to explore statistical questions 
and promote understanding of statistical concepts.